Projects

Blog Posts

Databricks Series, Part 6: ML Serving and Workflows

Batch and real-time model inference, Databricks Model Serving endpoints, and orchestrating the full ML pipeline with Databricks Workflows.

Databricks Series, Part 5: Machine Learning with MLflow

Tracking experiments, logging models and artifacts, comparing runs, and managing the model lifecycle with MLflow on Databricks.

Databricks Series, Part 4: Feature Engineering at Scale

Databricks Feature Store, FeatureEngineeringClient, FeatureLookup, training sets, and eliminating training-serving skew.

Databricks Series, Part 3: Data Ingestion with Auto Loader

The cloudFiles format, schema inference, schema evolution, and building robust incremental ingestion pipelines on Databricks.

Databricks Series, Part 2: Lakehouse Architecture

Unity Catalog for governance and discovery, the medallion Bronze/Silver/Gold pattern, and Delta tables as the storage foundation.

Databricks Series, Part 1: Getting Started

Navigating the Databricks workspace, launching clusters, writing notebooks, and submitting your first PySpark job.

Databricks Series, Part 0: Overview

The lakehouse platform concept, what Databricks adds on top of Spark and Delta Lake, and how it compares to alternatives.

Spark Series, Part 4: Performance Tuning

Making Spark jobs fast — partitioning, shuffles, skew, caching, and the most common bottlenecks in production.

Spark Series, Part 3: Structured Streaming

Real-time data processing with Spark Structured Streaming — micro-batches, triggers, watermarks, and output modes.

Spark Series, Part 2: DataFrames and Spark SQL

The practical Spark API — working with structured data using DataFrames, schemas, and SQL queries.

Spark Series, Part 1: RDDs and the Execution Model

Understanding Resilient Distributed Datasets, the foundation of Spark's execution model: transformations, actions, and lazy evaluation.

Spark Series, Part 0: Overview

A high-level introduction to Apache Spark — what it is, why it exists, and where it fits in the modern data stack.

Designing a Data Platform That Doesn't Rot

Lessons from building internal data platforms: what makes them last, what kills them, and the principles I try to apply.